The Distributional Learning of Multi-Word Expressions: A Computational Approach
There has been much recent research in corpus and computational linguistics on distributional learning algorithms—computer code that induces latent linguistic structures in corpus data based on co-occurrences of transcribed units in that data. These algorithms have varied applications, from the investigation of human cognitive processes to the corpus extraction of relevant linguistic structures for lexicographic, second language learning, or natural language processing applications, among others. They also operate at various levels of linguistic structure, from phonetics to syntax. One area of research on distributional learning algorithms in which there remains relatively little work is the learning of multi-word, memorized, formulaic sequences, based on the co-occurrences of words. Examples of such multi-word expressions (MWEs) include kick the bucket, New York City, sit down, and as a matter of fact. In this dissertation, I present a novel computational approach to the distributional learning of such sequences in corpora. Entitled MERGE (Multi-word Expressions from the Recursive Grouping of Elements), my algorithm iteratively works by (1) assigning a statistical ‘attraction’ score to each two-word sequence (bigram) in a corpus, based on the individual and co-occurrence frequencies of these two words in that corpus; and (2) merging the highest-scoring bigram into a single, lexicalized unit. These two steps then repeat until some maximum number of iterations or minimum score threshold is reached (since, broadly speaking, the winning score progressively decreases with increasing iterations). Because one (or both) of the ‘words’ making up a winning bigram may be an output merged item from a previous iteration, the algorithm is able to learn MWEs that are in principle of any length (e.g., apple pie versus I’ll believe it when I see it). 
Moreover, these MWEs may contain one or more discontinuities of different sizes, up to some maximum size threshold (measured in words) specified by the user (e.g., as _ as in as tall as and as big as). Typically, the extraction of MWEs has been handled by algorithms that identify only continuous sequences, and in which the user must specify the length(s) of the sequences to be extracted beforehand; thus, MERGE offers a bottom-up, distributional-based approach that addresses these issues. In the present dissertation, in addition to describing the algorithm, I report three rating experiments and one corpus-based early child language study that validate the efficacy of MERGE in identifying MWEs. In one experiment, participants rate sequences extracted from a corpus by the algorithm for how well they instantiate true MWEs. As expected, the results reveal that the high-scoring output items that MERGE identifies early in its iterative process are rated as ‘good’ MWEs by participants (based on certain subjective criteria), with the quality of these ratings decreasing for output from later iterations (i.e., output items that were scored lower by the algorithm). In the other two experiments, participants rate high-ranking output both from MERGE and from an existing algorithm from the literature that also learns MWEs of various lengths—the Adjusted Frequency List (Brook O’Donnell 2011). Comparison of participant ratings reveals that the items that MERGE acquires are rated more highly than those acquired by the Adjusted Frequency List, suggesting that MERGE is a performance frontrunner among distributional learning algorithms of MWEs.
More broadly, together the experiments suggest that MERGE acquires representations that are compatible with adult knowledge of formulaic language, and thus it may be useful for any number of research applications that rely on such formulaic language as a unit of analysis. Finally, in a study using two corpora of caregiver-child interactions, I run MERGE on caregiver utterances and then show that, of the MWEs induced by the algorithm, those that go on to be later acquired by the children receive higher scores from the algorithm than those that do not go on to be learned. These results suggest that, when applied to acquisition data, the algorithm is useful for identifying the structures of statistical co-occurrences in the caregiver input that are relevant to children in their acquisition of early multi-word knowledge. Overall, MERGE is shown to be a powerful computational approach to the distributional learning and extraction of MWEs, both when modeling adult knowledge of formulaic language, and when accounting for the early multi-word structures acquired by children.
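The score-and-merge loop described in this abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the abstract does not name the attraction statistic, so pointwise mutual information is used here purely as a stand-in, and the sketch handles only contiguous bigrams (no discontinuities). The names `pmi_score`, `merge_mwes`, `min_count`, and the default thresholds are all hypothetical.

```python
import math
from collections import Counter


def pmi_score(pair_count, left_count, right_count, n_tokens, n_pairs):
    """Pointwise mutual information; an ASSUMED stand-in for the attraction score."""
    p_pair = pair_count / n_pairs
    p_left = left_count / n_tokens
    p_right = right_count / n_tokens
    return math.log2(p_pair / (p_left * p_right))


def merge_mwes(tokens, max_iterations=10, min_score=0.0, min_count=2):
    """Repeatedly merge the highest-scoring adjacent pair into one unit.

    Returns (learned, tokens): the merged items with their scores, in the
    order they were learned, and the rewritten token sequence.
    Contiguous bigrams only; gapped sequences are NOT modeled here.
    """
    tokens = list(tokens)
    learned = []
    for _ in range(max_iterations):
        if len(tokens) < 2:
            break
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))

        # Step 1: score every bigram seen at least min_count times.
        best_pair, best_score = None, float("-inf")
        for pair, count in bigrams.items():
            if count < min_count:
                continue
            score = pmi_score(count, unigrams[pair[0]], unigrams[pair[1]],
                              len(tokens), len(tokens) - 1)
            if score > best_score:  # ties keep the earliest-seen pair
                best_pair, best_score = pair, score

        # Stop when nothing qualifies or the winner falls below threshold.
        if best_pair is None or best_score < min_score:
            break

        # Step 2: rewrite the corpus with the winner as a single unit, so
        # later iterations can extend it into longer expressions.
        merged_item = " ".join(best_pair)
        rewritten, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best_pair:
                rewritten.append(merged_item)
                i += 2
            else:
                rewritten.append(tokens[i])
                i += 1
        tokens = rewritten
        learned.append((merged_item, best_score))
    return learned, tokens
```

Because a merged item re-enters the corpus as an ordinary token, a later iteration can pair it with a neighbor, which is how expressions longer than two words emerge in this sketch.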
The Long-Baseline Neutrino Experiment: Exploring Fundamental Symmetries of the Universe
The preponderance of matter over antimatter in the early Universe, the
dynamics of the supernova bursts that produced the heavy elements necessary for
life and whether protons eventually decay --- these mysteries at the forefront
of particle physics and astrophysics are key to understanding the early
evolution of our Universe, its current state and its eventual fate. The
Long-Baseline Neutrino Experiment (LBNE) represents an extensively developed
plan for a world-class experiment dedicated to addressing these questions. LBNE
is conceived around three central components: (1) a new, high-intensity
neutrino source generated from a megawatt-class proton accelerator at Fermi
National Accelerator Laboratory, (2) a near neutrino detector just downstream
of the source, and (3) a massive liquid argon time-projection chamber deployed
as a far detector deep underground at the Sanford Underground Research
Facility. This facility, located at the site of the former Homestake Mine in
Lead, South Dakota, is approximately 1,300 km from the neutrino source at
Fermilab -- a distance (baseline) that delivers optimal sensitivity to neutrino
charge-parity symmetry violation and mass ordering effects. This ambitious yet
cost-effective design incorporates scalability and flexibility and can
accommodate a variety of upgrades and contributions. With its exceptional
combination of experimental configuration, technical capabilities, and
potential for transformative discoveries, LBNE promises to be a vital facility
for the field of particle physics worldwide, providing physicists from around
the globe with opportunities to collaborate in a twenty to thirty year program
of exciting science. In this document we provide a comprehensive overview of
LBNE's scientific objectives, its place in the landscape of neutrino physics
worldwide, the technologies it will incorporate and the capabilities it will
possess.
Comment: Major update of previous version. This is the reference document for
LBNE science program and current status. Chapters 1, 3, and 9 provide a
comprehensive overview of LBNE's scientific objectives, its place in the
landscape of neutrino physics worldwide, the technologies it will incorporate
and the capabilities it will possess. 288 pages, 116 figures
LSST: from Science Drivers to Reference Design and Anticipated Data Products
(Abridged) We describe here the most ambitious survey currently planned in
the optical, the Large Synoptic Survey Telescope (LSST). A vast array of
science will be enabled by a single wide-deep-fast sky survey, and LSST will
have unique survey capability in the faint time domain. The LSST design is
driven by four main science themes: probing dark energy and dark matter, taking
an inventory of the Solar System, exploring the transient optical sky, and
mapping the Milky Way. LSST will be a wide-field ground-based system sited at
Cerro Pachón in northern Chile. The telescope will have an 8.4 m (6.5 m
effective) primary mirror, a 9.6 deg² field of view, and a 3.2 Gigapixel
camera. The standard observing sequence will consist of pairs of 15-second
exposures in a given field, with two such visits in each pointing in a given
night. With these repeats, the LSST system is capable of imaging about 10,000
square degrees of sky in a single filter in three nights. The typical 5σ
point-source depth in a single visit in r will be ∼24.5 (AB). The
project is in the construction phase and will begin regular survey operations
by 2022. The survey area will be contained within 30,000 deg² with
δ < +34.5°, and will be imaged multiple times in six bands, ugrizy,
covering the wavelength range 320–1050 nm. About 90% of the observing time
will be devoted to a deep-wide-fast survey mode which will uniformly observe an
18,000 deg² region about 800 times (summed over all six bands) during the
anticipated 10 years of operations, and yield a coadded map to r ∼ 27.5. The
remaining 10% of the observing time will be allocated to projects such as a
Very Deep and Fast time domain survey. The goal is to make LSST data products,
including a relational database of about 32 trillion observations of 40 billion
objects, available to the public and scientists around the world.
Comment: 57 pages, 32 color figures, version with high-resolution figures
available from https://www.lsst.org/overvie
Filovirus RefSeq Entries: Evaluation and Selection of Filovirus Type Variants, Type Sequences, and Names
Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information’s (NCBI’s) RefSeq is a non-redundant, curated database for reference (or type) nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming [<virus name> (<strain>)/<isolation host-suffix>/<country of sampling>/<year of sampling>/<genetic variant designation>-<isolate designation>], we report consensus decisions from a majority of past and currently active filovirus experts on the eight filovirus type variants and isolates to be represented in RefSeq, their final designations, and their associated sequences.
Phosphoinositide-3 Kinase-Akt Pathway Controls Cellular Entry of Ebola Virus
The phosphoinositide-3 kinase (PI3K) pathway regulates diverse cellular activities related to cell growth, migration, survival, and vesicular trafficking. It is known that Ebola virus requires endocytosis to establish an infection. However, the cellular signals that mediate this uptake were unknown for Ebola virus as well as many other viruses. Here, the involvement of PI3K in Ebola virus entry was studied. A novel and critical role of the PI3K signaling pathway was demonstrated in cell entry of Zaire Ebola virus (ZEBOV). Inhibitors of PI3K and Akt significantly reduced infection by ZEBOV at an early step during the replication cycle. Furthermore, phosphorylation of Akt-1 was induced shortly after exposure of cells to radiation-inactivated ZEBOV, indicating that the virus actively induces the PI3K pathway and that replication was not required for this induction. Subsequent use of pseudotyped Ebola virus and/or Ebola virus-like particles, in a novel virus entry assay, provided evidence that activity of PI3K/Akt is required at the virus entry step. Class 1A PI3Ks appear to play a predominant role in regulating ZEBOV entry, and Rac1 is a key downstream effector in this regulatory cascade. Confocal imaging of fluorescently labeled ZEBOV indicated that inhibition of PI3K, Akt, or Rac1 disrupted normal uptake of virus particles into cells and resulted in aberrant accumulation of virus into a cytosolic compartment that was non-permissive for membrane fusion. We conclude that PI3K-mediated signaling plays an important role in regulating vesicular trafficking of ZEBOV necessary for cell entry. Disruption of this signaling leads to inappropriate trafficking within the cell and a block in steps leading to membrane fusion. These findings extend our current understanding of the Ebola virus entry mechanism and may help in devising useful new strategies for treatment of Ebola virus infection.
Taxonomy of the order Mononegavirales : update 2016
In 2016, the order Mononegavirales was emended through the addition of two new families (Mymonaviridae and Sunviridae), the elevation of the paramyxoviral subfamily Pneumovirinae to family status (Pneumoviridae), the addition of five free-floating genera (Anphevirus, Arlivirus, Chengtivirus, Crustavirus, and Wastrivirus), and several other changes at the genus and species levels. This article presents the updated taxonomy of the order Mononegavirales as now accepted by the International Committee on Taxonomy of Viruses (ICTV).
Non-invasive cardiac assessment in high risk patients (The GROUND study): rationale, objectives and design of a multi-center randomized controlled clinical trial
Background: Peripheral arterial disease (PAD) is a common disease associated with a considerably increased risk of future cardiovascular events and most of these patients will die from coronary artery disease (CAD). Screening for silent CAD has become an option with recent non-invasive developments in CT (computed tomography)-angiography and MR (magnetic resonance) stress testing. Screening in combination with more aggressive treatment may improve prognosis. Therefore, we propose to study whether a cardiac imaging algorithm, using non-invasive imaging techniques followed by treatment, will reduce the risk of cardiovascular disease in PAD patients free from cardiac symptoms. Design: The GROUND study is designed as a prospective, multi-center, randomized clinical trial. Patients with peripheral arterial disease, but without symptomatic cardiac disease will be asked to participate. All patients receive proper risk factor management before randomization. Half of the recruited patients will enter the 'control group' and only undergo CT calcium scoring. The other half of the recruited patients (index group) will undergo the non-invasive cardiac imaging algorithm followed by evidence-based treatment. First, patients are submitted to CT calcium scoring and CT angiography. Patients with a left main (or equivalent) coronary artery stenosis of > 50% on CT will be referred to a cardiologist without further imaging. All other patients in this group will undergo dobutamine stress magnetic resonance (DSMR) testing. Patients with a DSMR positive for ischemia will also be referred to a cardiologist. These patients are candidates for conventional coronary angiography and cardiac interventions (coronary artery bypass grafting (CABG) or percutaneous cardiac interventions (PCI)), if indicated. All participants of the trial will enter a 5-year follow-up period for the occurrence of cardiovascular events. Sequential interim analysis will take place.
Based on sample size calculations about 1200 patients are needed to detect a 24% reduction in primary outcome. Implications: The GROUND study will provide insight into the question whether non-invasive cardiac imaging reduces the risk of cardiovascular events in patients with peripheral arterial disease, but without symptoms of coronary artery disease. Trial registration: Clinicaltrials.gov NCT0018911
Virus nomenclature below the species level : a standardized nomenclature for laboratory animal-adapted strains and variants of viruses assigned to the family Filoviridae
The International Committee on Taxonomy of Viruses (ICTV) organizes the classification of
viruses into taxa, but is not responsible for the nomenclature for taxa members. International
expert groups, such as the ICTV Study Groups, recommend the classification and naming of
viruses and their strains, variants, and isolates. The ICTV Filoviridae Study Group has recently
introduced an updated classification and nomenclature for filoviruses. Subsequently, and
together with numerous other filovirus experts, a consistent nomenclature for their natural
genetic variants and isolates was developed that aims at simplifying the retrieval of sequence
data from electronic databases. This is a first important step toward a viral genome annotation
standard as sought by the US National Center for Biotechnology Information (NCBI). Here, this
work is extended to include filoviruses obtained in the laboratory by artificial selection through
passage in laboratory hosts. The previously developed template for natural filovirus genetic
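variant naming (<virus name> <isolation host-suffix>/<country of sampling>/<year of
sampling>/<genetic variant designation>-<isolate designation>) is retained, but it is proposed to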
adapt the type of information added to each field for laboratory animal-adapted variants. For
instance, the full-length designation of an Ebola virus Mayinga variant adapted at the State
Research Center for Virology and Biotechnology “Vector” to cause disease in guinea pigs after
seven passages would be akin to “Ebola virus VECTOR/C.porcellus-lab/COD/1976/Mayinga-GPA-P7”.
As was proposed for the names of natural filovirus variants, we suggest using the full-length
designation in databases, as well as in the methods section of publications. Shortened
designations (such as “EBOV VECTOR/C.por/COD/76/May-GPA-P7”) and abbreviations (such
as “EBOV/May-GPA-P7”) could be used in the remainder of the text depending on how critical it is to convey information contained in the full-length name. “EBOV” would suffice if only one
EBOV strain/variant/isolate is addressed. This work was funded in part by the Joint Science and Technology Office for Chem Bio Defense (proposal #TMTI0048_09_RD_T to SB). http://www.springerlink.com/content/0304-8608/hb2013ab201
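To make the field layout of such designations concrete, the following is a minimal parsing sketch. It assumes the field order virus name, optional strain, isolation host-suffix, country of sampling, year of sampling, genetic variant designation, and isolate designation, as read off the example “Ebola virus VECTOR/C.porcellus-lab/COD/1976/Mayinga-GPA-P7”. It further assumes that the strain and host fields contain no spaces and that the variant designation contains no hyphen. The names `FilovirusDesignation` and `parse_designation` are hypothetical and not part of any published tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilovirusDesignation:
    virus_name: str
    strain: Optional[str]  # None when the designation has no strain field
    host_suffix: str
    country: str
    year: str
    variant: str
    isolate: str


def parse_designation(text: str) -> FilovirusDesignation:
    """Split a full-length designation into its fields (illustrative only)."""
    parts = text.strip().split("/")
    if len(parts) not in (4, 5):
        raise ValueError(f"expected 4 or 5 '/'-separated fields, got {len(parts)}")

    # Last field: <genetic variant designation>-<isolate designation>.
    # ASSUMPTION: the first hyphen separates variant from isolate.
    variant, sep, isolate = parts[-1].partition("-")
    if not sep:
        raise ValueError("missing '-' between variant and isolate designation")

    year, country = parts[-2], parts[-3]

    # First field: virus name, then a space, then either strain or host-suffix.
    virus_name, _, first_field = parts[0].rpartition(" ")
    if not virus_name:
        raise ValueError("missing virus name before the first field")

    if len(parts) == 5:
        strain, host_suffix = first_field, parts[1]
    else:
        strain, host_suffix = None, first_field

    return FilovirusDesignation(virus_name, strain, host_suffix,
                                country, year, variant, isolate)
```

For the guinea pig-adapted example, the sketch would yield the strain `VECTOR`, the host-suffix `C.porcellus-lab`, the variant `Mayinga`, and the isolate `GPA-P7`. Shortened forms such as “EBOV VECTOR/C.por/COD/76/May-GPA-P7” are not derived here, since the abbreviation rules are not spelled out in the abstract.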
Virus nomenclature below the species level : a standardized nomenclature for filovirus strains and variants rescued from cDNA
Specific alterations (mutations, deletions,
insertions) of virus genomes are crucial for the functional
characterization of their regulatory elements and their expression products, as well as a prerequisite for the creation
of attenuated viruses that could serve as vaccine
candidates. Virus genome tailoring can be performed either
by using traditionally cloned genomes as starting materials,
followed by site-directed mutagenesis, or by de novo synthesis
of modified virus genomes or parts thereof. A systematic
nomenclature for such recombinant viruses is
necessary to set them apart from wild-type and laboratory-adapted
viruses, and to improve communication and collaborations
among researchers who may want to use
recombinant viruses or create novel viruses based on them.
A large group of filovirus experts has recently proposed
nomenclatures for natural and laboratory animal-adapted
filoviruses that aim to simplify the retrieval of sequence
data from electronic databases. Here, this work is extended
to include nomenclature for filoviruses obtained in the
laboratory via reverse genetics systems. The previously
developed template for natural filovirus genetic variant
naming, <virus name> (<strain>/)<isolation host-suffix>/<country of sampling>/<year of sampling>/<genetic
variant designation>-<isolate designation>, is retained, but we propose to adapt the type of information added to each
field for cDNA clone-derived filoviruses. For instance, the
full-length designation of an Ebola virus Kikwit variant
rescued from a plasmid developed at the US Centers for
Disease Control and Prevention could be akin to “Ebola
virus H.sapiens-rec/COD/1995/Kikwit-abc1” (with the
suffix “rec” identifying the recombinant nature of the virus
and “abc1” being a placeholder for any meaningful isolate
designator). Such a full-length designation should be used
in databases and the methods section of publications.
Shortened designations (such as “EBOV H.sap/COD/95/Kik-abc1”)
and abbreviations (such as “EBOV/Kik-abc1”)
could be used in the remainder of the text, depending on
how critical it is to convey information contained in the
full-length name. “EBOV” would suffice if only one
EBOV strain/variant/isolate is addressed. http://link.springer.com/journal/705hb201
Sex differences in cerebral venous sinus thrombosis after adenoviral vaccination against COVID-19
Introduction: Cerebral venous sinus thrombosis associated with vaccine-induced immune thrombotic thrombocytopenia (CVST-VITT) is a severe disease with high mortality. There are few data on sex differences in CVST-VITT. The aim of our study was to investigate the differences in presentation, treatment, clinical course, complications, and outcome of CVST-VITT between women and men. Patients and methods: We used data from an ongoing international registry on CVST-VITT. VITT was diagnosed according to the Pavord criteria. We compared the characteristics of CVST-VITT in women and men. Results: Of 133 patients with possible, probable, or definite CVST-VITT, 102 (77%) were women. Women were slightly younger [median age 42 (IQR 28–54) vs 45 (28–56)], presented more often with coma (26% vs 10%) and had a lower platelet count at presentation [median (IQR) 50×10⁹/L (28–79) vs 68 (30–125)] than men. The nadir platelet count was lower in women [median (IQR) 34 (19–62) vs 53 (20–92)]. More women received endovascular treatment than men (15% vs 6%). Rates of treatment with intravenous immunoglobulins were similar (63% vs 66%), as were new venous thromboembolic events (14% vs 14%) and major bleeding complications (30% vs 20%). Rates of good functional outcome (modified Rankin Scale 0-2, 42% vs 45%) and in-hospital death (39% vs 41%) did not differ. Discussion and conclusions: Three quarters of CVST-VITT patients in this study were women. Women were more severely affected at presentation, but clinical course and outcome did not differ between women and men. VITT-specific treatments were overall similar, but more women received endovascular treatment.